Using Bi-sets That Characterize Bi-partitions as Features for Classification: an Application to Microarray Data Analysis

Authors

  • Ivica Slavkov
  • Ruggero Pensa
  • Sašo Džeroski
Abstract

As part of the effort to build a unified framework for Inductive Databases (IDBs), an important step is to find a way to combine local patterns discovered in the data with global predictive models. In this paper, we investigate the possibility of using bi-sets (local patterns) as features for classification. When mining bi-sets from Boolean data, a large number of bi-sets is usually generated, even under reasonable frequency constraints. To discern which of these bi-sets could be useful as features for classification, we use a scoring function whose parameters include the bi-set’s coverage and size. After a feature construction process, we perform an experimental evaluation on Huntington’s disease (HD) microarray data. We apply Predictive Clustering Trees to the problems of distinguishing between HD and healthy subjects and of determining the stage of development of the disease.
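
The feature construction idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the product of coverage and size used below is only a placeholder assumption, and all names (`BiSet`, `score`, `biset_feature`, `build_features`, `top_k`) are hypothetical rather than taken from the paper. Each selected bi-set becomes one Boolean feature that is 1 when a sample has every attribute of the bi-set.

```python
from typing import FrozenSet, List, NamedTuple, Set


class BiSet(NamedTuple):
    rows: FrozenSet[int]  # objects (e.g. samples) covered by the pattern
    cols: FrozenSet[int]  # Boolean attributes (e.g. genes) in the pattern


def coverage(biset: BiSet, n_rows: int) -> float:
    """Fraction of all objects that the bi-set covers."""
    return len(biset.rows) / n_rows


def size(biset: BiSet) -> int:
    """Number of cells (objects x attributes) in the bi-set."""
    return len(biset.rows) * len(biset.cols)


def score(biset: BiSet, n_rows: int) -> float:
    # Placeholder assumption: the abstract only says the score depends on
    # coverage and size, not how the two are combined.
    return coverage(biset, n_rows) * size(biset)


def biset_feature(sample: Set[int], biset: BiSet) -> int:
    """Return 1 if the sample has every attribute of the bi-set, else 0."""
    return int(biset.cols <= sample)


def build_features(samples: List[Set[int]], bisets: List[BiSet],
                   top_k: int) -> List[List[int]]:
    """Keep the top_k highest-scoring bi-sets and encode each sample with them."""
    n_rows = len(samples)
    ranked = sorted(bisets, key=lambda b: score(b, n_rows), reverse=True)[:top_k]
    return [[biset_feature(s, b) for b in ranked] for s in samples]


# Toy Boolean data: each sample is the set of attribute indices that are true.
samples = [{0, 1, 2}, {0, 1}, {2, 3}, {3}]
bisets = [
    BiSet(frozenset({0, 1}), frozenset({0, 1})),
    BiSet(frozenset({2, 3}), frozenset({3})),
    BiSet(frozenset({0}), frozenset({0, 1, 2})),
]
features = build_features(samples, bisets, top_k=2)
```

The resulting feature matrix could then be passed to any classifier; the paper itself uses Predictive Clustering Trees, which this sketch does not implement.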

Similar resources

Diagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms

Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities is of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help the clinicians to predict the risk factors that cause the diabetes or pre...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining, research has become focused on microarrays. Microarrays are datasets with a large number of genes, most of which are usually irrelevant to the output class; hence, gene selection (feature selection) is essential. Removing redundant genes can increase the speed and accuracy of classification. After applying the gene se...

BI-MATRIX GAMES WITH INTUITIONISTIC FUZZY GOALS

In this paper, we present an application of intuitionistic fuzzy programming to a two person bi-matrix game (pair of payoff matrices) for the solution with mixed strategies using linear membership and non-membership functions. We also introduce the intuitionistic fuzzy (IF) goal for a choice of a strategy in a payoff matrix in order to incorporate ambiguity of human judgements; a player wants to max...

Supporting bi-cluster interpretation in 0/1 data by means of local patterns

Clustering or co-clustering techniques have been proved useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. As a result, interpreting clustering results and discovering knowledge from them can be quite hard. We consider potentially large Boolean data sets which record properties of objects and we assume the availability of a...

Publication date: 2006